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ABSTRACT 

The search of a clustering signal in the arrival directions of ultra-high-energy cosmic rays (UHECRs) is a standard method to assess 
the level of anisotropy of the data sets under investigation. Here, we first show how to quantify the sensitivity of a UHECR detector to 
the detection of anisotropy, and then propose a new method that pushes forward the study of the two-point auto-correlation function, 
enabling one to put astrophysically meaningful constraints on both the effective UHECR source density and the angular deflections 
that these charged particles suff'er while they propagate through the galactic and intergalactic magnetic fields. We apply the method 
to simulated data sets obtained under various astrophysical conditions, and show how the input model parameters can be estimated 
through our analysis, introducing the notion of "clustering similarity" (between data sets), to which we give a precise statistical 
meaning. We also study how the constraining power of the method is influenced by the size of the data set under investigation, the 
minimum energy of the UHECRs to which it is applied, and a prior assumption about the underlying source distribution. We also 
show that this method is particularly adapted to data sets consisting of a few tens to a few hundreds of events, which corresponds to 
the current and near-future observational situation in the field of UHECRs. 
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1. Introduction 

Despite considerable experimental and theoretical efforts, the 
nature and origin of ultra-high-energy cosmic rays (UHECRs) 
remain largely unknown (e.g. Kotera & Olinto, 2011; Allard, 
2011; Abreu et al., 2011a,b; Tokuno et al., 2011; Sokolsky et 
al., 2010). It is believed that a detailed account of the processes 
responsible for the generation of what seems to be the highest 
energy particles in the universe would represent an important 
step towards the understanding of non-thermal cosmic sources 
in general, through the key process of particle acceleration in as- 
trophysical environments (see e.g. AUard & Protheroe, 2009 and 
Hillas, 2006). 

UHECRs are characterized by a very low flux, of the order 
of 1 per km^ per century, and by a natural energy scale associ- 
ated with their interaction with the intergalactic photons (notably 
from the cosmological microwave background): the GZK scale, 
around 610'^ eV. Above this energy, UHECR protons and nuclei 
lose energy through photo-pion production and photodissocia- 
tion, respectively, so they cannot propagate over large distances. 
This results in the appearance of a "GZK" horizon beyond which 
UHECRs cannot be visible from Earth. At 6 10'^ eV, the horizon 
distance is of the order of 200 Mpc, and it is rapidly decreasing 
above this energy. As a consequence, the number of sources ef- 
fectively observable, and thus the UHECR flux actually observed 
is predicted to decrease sharply above this energy (see Greisen, 
1966 and Zatsepin, 1966), in agreement with the observations of 
HiRes (Sokolsky et al., 2009) and Auger (Abraham et al., 2010; 
Abreu Retal., 2011a). 

While it makes the observation of hundreds of UHECRs a 
very difficult experimental challenge (Kajino et al., 2010), this 
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so-called "GZK effect" can actually be regarded as a helping 
hand offered by Nature for the identification of UHECR sources. 
Indeed, it comes to alleviate the problem caused by the deflection 
of cosmic rays by the intervening magnetic fields (Galactic and 
extragalactic), as its main consequence is to reduce the number 
of contributing sources, hence to increase their angular separa- 
tion over the sky. Now, if the deflections are smaller than the typ- 
ical angular distance between the main UHECR sources (which 
is also more likely to happen at the highest energies, where the 
magnetic rigidity is larger), localized clusters of events can be 
expected to be identified as soon as the integrated exposure of the 
UHECR experiments becomes large enough for several events to 
be contributed by single sources. 

This suggests that the analysis of UHECR clustering over the 
sky can be used to constrain the number of sources, or equiva- 
lently the source density, together with the effective deflection of 
UHECRs from their sources to the Earth. These two parameters 
are particularly important to understand the phenomenology of 
UHECRs, and eventually identify their sources. On one hand, 
the typical angular size of the deflections provides clues about 
the charge of the UHECR particles and/or the strength of the 
intervening magnetic fields. On the other hand, narrowing the 
range of possible source densities would directly lead to the re- 
jection of some source models, notably if the number density of a 
given type of sources in the nearby universe is smaller than actu- 
ally required by the data. In addition, an estimate of the UHECR 
source density would immediately reflect into an estimate of the 
individual source power, since the total energy input in the form 
of UHECRs is known from the measured flux. In turn, knowing 
the individual source power would provide important clues to- 
wards the possible sources (which must obviously be at least as 
powerful) as well as the acceleration efficiency (i.e. the fraction 
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of energy imparted to cosmic rays), if the total source power is 
known or estimated by other means. 

Observations of the clustering signal in the UHECR data 
from AGASA triggered several studies essentially designed to 
constrain the local density of sources (e.g. see Dubovsky et al., 
2000) along with constraints on the source luminosity (Blasi & 
De Marco, 2004). The clustering measurement brought by the 
Pierre Auger Observatory (Abreu et al., 2010; Abraham et al., 
2007) put on more solid grounds the power of such studies, and 
the analysis of the first Auger data lead to stronger constraints 
on the density of sources (see Cuoco et al., 2008 and Abreu P. et 
al., 2011b) by using the autocorrelation function and performing 
a scan in energies and separation angles. An update of this re- 
sult was recently presented (see De Domenico et al. in Abreu P. 
et al., 2011b), which notably constrained the density of sources 
assumed to be distributed like the galaxies. In this context, our 
study adds a key parameter of the phenomenology of UHECRs 
and put an emphasis on the joint constraints that can be put on 
the source density and the deflection. 

Recently, the Auger Collaboration rejected the hypothe- 
sis that the UHECR sky was isotropic at a 99% Confidence 
Level (Abraham et al., 2007): despite its limited Confidence 
Level, this result represents an expected and important milestone 
in the field. It was obtained by demonstrating that the arrival 
directions of UHECRs have a larger probability to he within a 
given subpart of the sky (originally built as the region within 3 
degrees of an AGN reported in a given catalog) than if they were 
isotropic. While this provided no clue about the actual UHECR 
sources, nor about their actual deflections by magnetic fields, it 
did show that the deflections are not large enough to spoil, in 
the observed sky, what may be inferred as an anisotropic source 
distribution. Likewise, the two-point correlation function of the 
Auger events reveals an excess of clustering at some angular 
scale, compared to isotropic expectations, with an overall sig- 
nificance of the order of 1% (Abraham et al., 2008). An update 
of this result was released in 2010 (see Abreu et al., 2010). 

In this paper, we investigate whether this kind of clustering 
analyses can be developed to do more than testing the isotropy 
hypothesis or setting limits on the sole density of sources, and 
actually derive constraints within the 2D parameter space that is 
most relevant from the astrophysical point of view, namely the 
space of UHECR source density and angular deflections, which 
we call here in short the (ns,^)-space. 

2. Principle of the analysis 

The idea behind the proposed study is that the "level of cluster- 
ing" of a given data set (a notion to be understood in an intu- 
itive way for the moment) depends on the number of sources 
contributing, and on the amplitude of the random deflections 
which spread the UHECRs around a central position, resulting 
in less dense clusters and an possible overlap of nearby sources. 
Qualitatively, if the local universe (within the GZK horizon) con- 
tains only a few sources, and the magnetic deflections are small, 
the data set will show a high level of clustering on small an- 
gular scale (Mollerach et al., 2009). On the other hand, if there 
are many sources all over the sky and large deflections, data sets 
should not show a significant level of clustering at any scale. An 
intermediate situation would be that of a low source density, with 
only a few sources within the GZK horizon, but relatively large 
deflections. In that case, one may also expect some significant 
clustering, although al a larger angular scale. 

The analysis proposed here consists in comparing the level 
of clustering of a data set of UHECRs with the level of clustering 



of simulated data sets of the same size that would be produced by 
a range of astrophysical models characterized by a given source 
density, n^, and a given deflection scale, 6. Models that have a 
very low probability to provide either "as much clustering" or 
"as little clustering" as the data set under investigation will then 
be declared incompatible with the data set (with the correspond- 
ing confidence level). 

This analysis, however, depends on an assumption regard- 
ing the global source distribution underlying the simulated uni- 
verses. Indeed, the level of clustering of a data set may also re- 
flect the level of clustering of the sources themselves. For in- 
stance, a large density of sources distributed uniformly over the 
sky will produce a data set with no significant clustering, what- 
ever the deflection scale, while a data set built from sources 
distributed non-homogeneously will be "more clustered" than 
a random, uniform data set, because the sources are. For this 
reason, we investigate here two different cases, drawing sources 
randomly (with the density under investigation): i) from a per- 
fectly uniform distribution (sources are equally likely to be 
found anywhere in the Universe), ii) from a distribution match- 
ing that of the galaxies in the nearby universe (in a similar way 
as in Abreu P. et al., 2011b). The former case allows us to ex- 
plore the main characteristics of this kind of analyses, in a gen- 
eral, "neutral" situation. The latter case is more relevant to the 
current UHECR phenomenology favoring bottom-up scenarios, 
where the sources are presumably distributed like matter itself, 
whether they are a class of galaxies (e.g. active galaxies, classes 
of AGNs...), a roughly random subset of galaxies (e.g. gamma- 
ray burst scenarios, young pulsars...), or a subset of clusters of 
galaxies (e.g. cluster-scale shocks...) 

3. Simplifying assumptions 

The deflection of a charged UHECR from its source to the Earth 

is governed by its interaction with the intervening magnetic field, 
both Galactic and extragalactic. Particles with the same mag- 
netic rigidity, i.e. the same energy-to-charge ratio, EjZ, follow 
identical paths in the universe, while the overall deflections are 
essentially proportional to Z and inversely proportional to E. 
UHECRs from difl'erent sources in the sky should be deflected 
in different ways, because they folUow different paths in the 
Galaxy. 

However, the amphtude and geometrical structure (orienta- 
tion, coherence length, filling factor) of the magnetic fields in 
the universe are unfortunately not known at the moment, even 
grossly (Beck, 2011). Therefore, in this exploratory paper, we 
do not attempt a detailed treatment of the UHECR deflections, 
which would be strongly model-dependent anyway, and con- 
sider a simplified model in which the deflections are represented 
by a single parameter, 6, which represents the "angular size" of 
the overall deflection, for any cosmic ray above a given energy. 
We ascribe to this angular parameter 6 the following operational 
meaning: the event directions are drawn from a 2D gaussian dis- 
tribution centered at the position of the source and a spread such 
that the average deflexion angle is 6. 

Note that, in reality, the distribution of UHECRs from a 
given source are expected to be somewhat elongated along 
the direction of the average regular magnetic field towards the 
source, with the highest rigidity particles being deflected the 
least. However, since the UHECR spectrum is steep, most of the 
UHECRs received form a given source above a given energy 
have essentially that energy and should thus experience similar 
deflections (except if they have different charges). For the same 
reason, we shall use a unique angular parameter 6 for all events. 
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whatever their energy above a given threshold (although 5 does, 
of course, depend on that threshold). It would of course be easy 
to scale the deflection with the inverse of the rigidity, but that 
would not change our results significantly here: what we are in- 
terested in is a constraint on an effective "deflection scale", av- 
eraged over all UHECRs above a given energy, from all regions 
of the sky. More refined models can easily be adapted for the 
needs of future analyses with specific data sets and specific mag- 
netic field and composition models. It may also be objected that, 
while the random (turbulent) magnetic field may be expected to 
essentially "broaden" the source into a roughly Gaussian distri- 
bution of UHECRs over the scale 6, the regular magnetic field, 
mostly in the Galaxy, tends to shift the position of the centroid of 
this distribution away from the actual source position. However, 
since we shall draw the positions of the sources randomly (with 
a given source density and according to a given probability func- 
tion), the actual position of the centroids of the resulting distri- 
butions of events are not relevant (unless the shifts are larger 
than the typical autocorrelation scale of the source probability 
distribution), and we can simply ignore such shifts for the global 
clustering analysis proposed here. 

Finally, we assume that the deflections are independent of 
the source distance, which essentially comes down to assuming 
that they are dominated by the influence of the Galactic magnetic 
field, which appears reasonable given the current understanding 
of the extragalactic magnetic field (e.g. Ensslin, 2005). However, 
just as with the above simplifications, it is easy, in principle, to 
include any specific law for the deflection of UHECRs from any 
given source when building our test data sets, at the cost of addi- 
tional free parameters or model assumptions (which we believe 
unnecessary in the current stage of understanding of the UHECR 
phenomenology) . 

Another assumption concerns the luminosity of the differ- 
ent sources. Here, we simply assume that all sources have the 
same intrinsic luminosity. This may apply if the main sources 
are "candle events" (such as, possibly, GRBs (Murase et al., 
2008)), but it is almost certainly not the case if AGNs are the 
main sources. However, in the absence of a favored model for 
the UHECR sources, we chose not to add another set of free 
parameters to describe the luminosity distribution. The source 
density that the proposed analysis may favor with a real data 
set should then be interpreted as an effective source density, that 
is the density which leads to a similar "clustering signal" if all 
sources were intrinsically similar. This value remains an inter- 
esting parameter to constrain UHECR source models. We sim- 
ply note here that data sets built with sources having a range 
of luminosities generically provide a larger "clustering signal" 
than data sets from identical sources with the same cosmic dis- 
tribution and density. This is because the flux is then likely to 
be dominated by a source which is unusually close and intense, 
providing (on average) a larger number of UHECRs than in the 
case of perfectly identical sources. 

4. Data sets and clustering signal 

Our goal is to compare the level of clustering of a reference data 
set produced by a given astrophysical model, with the level of 
clustering of some tested data sets. A given astrophysical model 
is characterized here by a set of assumptions concerning: 

- the probability distribution of the source locations: either 
uniform or proportional to the matter density in the universe; 

- the source density, n^: we explore densities ranging from 
10"^ to lO""' per Mpc-*, the lower limit being set by the re- 



quirement that there are at least a few sources within the 
GZK horizoiQ and the upper limit by the observation that 
there are no visible sources in our Galaxy or local group; 
- the angular scale, 6, of the deflections - which we leave as 
a free parameter ranging from 1 degree (the typical angular 
resolution of UHECR experiments) to 90 degrees. 

For each model, we simulate data sets with a given number 
of events, A^evt> in the following way. First, we draw the posi- 
tion of the sources in the universe as a random realization of 
the underlying source distribution, with source density «s. Then, 
we propagate the cosmic rays injected by these sources in the 
intergalactic medium with a power-law spectrum in E^^, as re- 
quired to fit the UHECR data. Here, for definiteness, we assume 
pure protons sources with a maximum energy fimax - 10^"'^ eV, 
but we found that the results are essentially independent of x 
and Zimax- We set a minimum energy fimin and draw A^evt events 
above that energy from the propagated flux at Earth, taking into 
account the detector acceptance. Here, we assume the same cov- 
erage map as that of the Pierre Auger Observatory. Finally, for 
each of the A^evt events, we choose its location at an angular dis- 
tance a from its actual source, where a is drawn randomly from 
a 2D-Gaussian distribution with variance 6^ (see Sect. [sjl. 

For each data set built in this way, we count the number of 
pairs of events, C{d), separated by an angular distance smaller 
than 0, for values ranging from 1° to 90°, by increments of 1°. 
These numbers are then compared with the numbers expected 
for data sets of the same size, built from tested models under 
similar conditions. The latter can be isotropic (Sect. |5]l or corre- 
spond to generic astrophysical models of interest (Sect.|6]l. 

5. Sensitivity of UHECR detectors to anisotropic 
astrophysical models 

The standard use of the clustering analysis consists in investigat- 
ing the anisotropy of a given data set, denoted by £)(), by com- 
paring its values of C{d) with the expectations of tested data sets 
obtained under the assumption of an isotropic flux (weighted by 
the detector acceptance, see Finley et al., 2004). To this aim, we 
first build the distributions of the C{9) from 10^ realizations of 
isotropy-based data sets. Fig. [T] shows three examples of these 
distributions for data sets of 60 events above Ej„i,^ = 60 EeV. 

Then, for each value of 6, we extract the fraction, V+iff), 
of these "isotropic" data sets (the quotation marks remind one 
that the detector acceptance is actually taken into account) that 
are "more clustered" than the data set at angular scale 9, by 
which we mean that they have as many or more pairs of events 
within than £)(). Marginalizing over 9, we then determine the 
minimum of this function, which occurs at some angular scale 
which we do not use in the current analysis. This minimum prob- 
ability, !P+,min = niin{!P+(0)), is an intermediate number, which 

G 

measures the level of clustering of data set £)(>. To turn this num- 
ber into a real probability that an isotropic data set be as much (or 
more) clustered as Dq, we finally need to complete the marginal- 
ization procedure (see Finley et al., 2004), answering the follow- 
ing question: "what is the probability that a purely random data 
set (in the above sense) has a value of !P+,min that is lower than 
that found with data set ©o?" (whatever the value of 9 at which 
this minimum occurs). For this, we apply the same procedure 

' Due to energy losses of the protons, the sources at a greater distance 
than ~200 Mpc can not contribute to more than a few percent to the 
UHECR flux above 60 EeV. This efl'ect is the so-called "GZK horizon" 
and allows us to define a maximum distance for the sources. 
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Fig. 1. Distributions of the number of pairs of events, C{0), 
within angular distance 6 - 1°, 10° and 30°, for data sets con- 
taining 60 events isotropically distributed, assuming a geometri- 
cal acceptance identical to that of the Pierre Auger Observatory 



to another large number of isotropic data sets. This gives us the 
distribution of !P+,min values shown in Fig. |2j from which we get 
the fraction of these data sets which have a lower !P+,min- This is 
the so-called p-value, which we denote as !P+: it gives the prob- 
ability that an isotropy-based data set be "more clustered" than 
the data set under investigation, £)o, at some unspecified angular 
scale. 

The usual use of this procedure is to investigate the signifi- 
cance of the anisotropy of a given data set. It can also be used to 
determine a priori the sensitivity of a given UHECR experiment 
to detect anisotropy, for different astrophysical models. This is 
illustrated in Figs. [3]and|4] where we show the mean value of 
!P+ for 100 data sets built from astrophysical models with source 




Fig. 2. Distribution of the P+,min values for "isotropic" data sets 
of 60 events (actually weighted by the coverage map of the Pierre 
Auger Observatory). 



density n,, shown in abscissa, and deflection scale 6, shown in 
ordinate. In Fig. [3] we assume an a priori uniform source dis- 
tribution in the universe, and show the sensitivity of a detector 
with the same coverage map as Auger, for data sets of 60, 100, 
300 and 600 events above 60 EeV. As expected, the lower-left 
part of the plot shows a very low p-value. This is because mod- 
els with few sources and small deflections will lead to strongly 
clustered data sets, where several events from the same source 
gather within a short angular scale on the sky. It is thus very un- 
likely that an actually isotropic sky leads to the same level of 
clustering. Going from left to right on the plots, the source den- 
sity increases, so there are on average fewer and fewer events per 
source (for a given size of the data set), and the clustering sig- 
nal becomes smaller and smaller, even for a low deflection scale. 
In the limit where there is never more than one event per source, 
the observed clustering of the events reflects the clustering of the 
sources themselves. Going from bottom to top, the angular de- 
flection scale increases, so the clustering of events from the same 
source is blurred over larger scales, and if the sources are nu- 
merous enough (large source density), their individual "image" 
on the sky will overlap, making the data set more and more sim- 
ilar to a data set drawn from an isotropic flux. This explains the 
large p-value found in the upper-right part of the plots. Finally, 
for larger data sets, there will be on average more events per 
source (whatever the source density), so the intrinsic clustering 
of the UHECR events will be larger, compared to the isotropic 
expectations. This is why the low p-value regions are basically 
moved towards the upper-right corner as the data set becomes 
larger, as seen on the figures. 

These plots can be regarded as indicating the range of as- 
trophysical models for which the detector is sensitive enough 
to detect a significant anisotropy in data sets of a given size, as 
represented by the blue-green part of the parameter space. For 
instance, if a detector can collect 100 events above 60 EeV, it 
can be expected to assess the anisotropy of the UHECR sky at 
the 99.9% confidence level (C.L.) for models with a source den- 
sity lower than 2 10"^ Mpc~^ and a deflection scale of the order 
of 1° or models with a source density around 10"* Mpc"^ and a 
deflection scale smaller than 15°. The sensitivity is clearly im- 
proved for detectors able to collect 600 events above the same 
energy (see below for further comments). 
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Fig. 3. Mean value of logCP,) for 100 realizations of reference ^'S- 4- Mean value of log(;P,) for 100 realizations of reference 

data sets with parameters (effective density of sources, in ab- ^ata sets with parameters and S (see Figj3] for axis defini- 

scissa) and 6 (effective deflections, in ordinate). Four sizes of ^i™)' F"^"" ^'^^^ °f '^^t^ s^^^ considered: (downwards) 

the data sets are considered: (downwards) 60, 100, 300 and 600 ^0, 100, 300 and 600 events. All data sets are built from an a pri- 

events. All data sets ai-e built from an a priori uniform source anisotropic source distribution, following that of the nearby 

distribution. The tested model corresponds to the isotropy hy- g.^l^^^^^- The tested model corresponds to the isotropy hypothe- 
pothesis. 
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Fig. 5. Distribution of the values of !P±,min for 3000 data sets of 
60 UHECR events above 60 EeV, obtained with the following 
test model: «s - lO^'^Mpc"^, S - 15°, and a uniform a priori 
source distribution. 



In Fig.|4j we show the same results for the hypothesis of an a 
priori source distribution similar to the distribution of the galax- 
ies in the local universe, taken as a subsample of the 2MRS cata- 
log (Huchra et al., 2011). The maximum distance of the sources 
is ~200 Mpc, taken as the GZK horizon for 60-EeV-protons (see 
above). Moreover, the subsample completeness is ensured by 
cutting out sources with a magnitude in the K-band lower than 
11. 2^ As is shown in Fig.[4j the same trends as in the hypothe- 
sis of a distribution are observed. The main difference is that the 
models assuming small UHECR deflections appear incompati- 
ble with isotropic expectations even in the case of large source 
densities (lower-right part of the figures). The reason for that is 
that, even when there is only one event per source (small data 
sets), the clustering of the sources themselves is apparent in the 
resulting data sets, producing a significant clustering signal that 
isotropic data sets can only exceptionally exhibit by chance. 

This result is interesting because it shows that detectors with 
moderate acceptance can detect a significant anisotropy signal in 
a large number of astrophysical cases, or, failing to detect this, 
constrain a large class of astrophysical models, typically reject- 
ing the blue area. Indeed, the data sets that a detector can collect 
if the actual source density and cosmic ray deflections corre- 
spond to a point in this region will generally be more clustered 
than 99.9% of the isotropy-based data sets. 

For example, let us consider a data set of A^evt events above 
60 EeV that would show an auto-correlation signal at the level 
of 1%. From the point of view of this clustering analysis, such a 
data set would thus be similar to the data sets collected from the 
astrophysical models corresponding to the green area in the fig- 
ures. Note that this does not mean that the actual universe must 
be one of these models. One can only say that the data set col- 
lected from the actual universe is similar to the typical data sets 
expected for these models, which could occasionally be obtained 
from other types of models. So to improve the analysis, it would 
be interesting to investigate the probability that a given type of 
astrophysical models lead to a data set similar to the data set un- 
der investigation (from the point of view of its auto-correlation). 
This is addressed in the next section. 



^ These cuts lead to an intrinsic density of the subsample of ~ lO"^^ "* 
Mpc"'. Due to limited CPU time, we limited the scan of the den- 
sity/deflection space to deflections of up to 10"'* Mpc"'. 



6. Deriving constraints on astrophysical models 

6. 1. Comparing data with classes of models 

In order to derive astrophysically meaningful constraints from 
the clustering analysis of a given reference data set, ©o, we now 
wish to compare its clustering signal with that of tested mod- 
els which are no longer isotropic, but span the whole parameter 
space (Hs, 6 and the type of source distribution). We shall thus 
propose a general test of a given astrophysical hypothesis, to be 
tested on the data set Do- 

The data set Do has two opposite ways to differ from the 
expectations of a given astrophysical model, A^test^ it can "more 
clustered" or "less clustered" than the typical data sets produced 
by this model. To quantify this potential discrepancy, we com- 
pute for each value of the distributions of C{0), defined in 
Sect. [4] for 10'* independent realizations of data sets of the same 
size as Do, built within model Altest- These distributions are sim- 
ilar to those shown in Fig.[T] We then determine, for each 6, the 
fraction V+iO), resp. V-iff), of tested data sets that have higher, 
resp. lower, values of C{9) than data set Do- From these, we ob- 
tain the angular scale where Do appears most unlike the typical 
tested models, whether because it is more clustered or less clus- 
tered. We note the corresponding fraction 'P±^mia'- 

!P±,mm = min{;P+(0),^P-(0)). (1) 

e 

Finally, we proceed as in the previous section and determine 
which fraction of another large sample of data sets built within 
the same model Attest have a smaller value of !P± min than the one 
obtained with Do- This number is noted V-t, and for simplicity, 
we call it the clustering similarity of data set Do with respect to 
model Altest (see below for its precise meaning). 

As an example, Fig.|5]shows the distribution of the values of 
'P±,mm for 3000 data sets of 60 UHECR events above 60 EeV, 
obtained with the following test model: - 10"'*Mpc"^, 5 - 
15°, uniform a priori source distribution. Then !P-t is simply the 
fraction of the data sets that appear on the left side of the value 
of 'P±,m\n obtained for the data set Do in this histogram: 

/-Pi.minOo) 

:P±= /(:P±,m,n)dy'±,m,„, (2) 

U —00 

where /(!P±,min) is the function that represents the normalized 
histogram. 

So is the probability that the test model. Attest, leads to a 
data set as dissimilar or more dissimilar to the average cluster- 
ing features corresponding to this model than the data set under 
investigation, Do- 
To illustrate how this method can be used on a real data 
set, we choose a particular astrophysical model in the parameter 
space and generate from this model one data set. Do, to be an- 
alyzed through its clustering properties. To generate Fig. |6j we 
started with a realization of an a priori uniform source model, 
with a source density - IGr^ Mpc"^ and a deflection angular 
scale 6 - 5° ,2& indicated by the white cross on the plot. We then 
propagated cosmic rays from this realization of the sources to 
build a data set of A^evt - 50 events above fimin - 60 EeV, which 
is our reference data set Do- 

The colors in Fig. |6] indicate the value of for Do- The 
region of the parameter space appearing in blue corresponds to 
models which have a low probability to yield data sets with clus- 
tering properties similar to that of Do (namely clustering prop- 
erties at least as different as those of Do from the typical cluster- 
ing properties for these models). By contrast, the regions in red 
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Fig. 6. Top: values of the "clustering similarity" f^- for all tested 
pairs of parameters {us, 5), obtained from the analysis of a typical 
data set built from a realization of the reference model - 
(ris - lO^^Mpc"^,^ - 5°), indicated by the white cross. For 
all models considered here, a uniform distribution of the sources 
was assumed. Bottom: confidence contours of 68% (red), 95% 
(light blue) and 99% (dark blue) extracted from the same data 
set £)(). These plots can be interpreted as giving the regions of 
the parameter space that are most compatible with the reference 
data set (see text for details). 



correspond to models which yield data sets with clustering prop- 
erties similar to Dq. On the lower-left of the plot, the data sets 
are typically much more clustered (i.e. show more pairs within 
small angular distances) than ©q. On the upper-right, on the con- 
trary, the typical data sets are much less clustered than £)o, be- 
cause there are more sources contributing, each of which appears 
through an image broadened by larger deflections. This illus- 
trates the power of an analysis that combines both the excesses 
and deficits of clustering of the data set under investigation, with 
respect to data sets produced by different astrophysical models. 
It can do more than simply assessing the anisotropy of the data 
set: it can point towards candidate astrophysical models. 

6.2. Comments on the statistical meaning of tlie "clustering 
similarity" 

To make this statement more precise, from the statistical point of 
view, it is interesting to indicate how confidence intervals can be 
derived for the model parameters. The values of the clustering 



similarity !P-t given in Fig. [6] for each and 6 are essentially p- 
values in a goodness-of-fit approach. This, of course, does not 
give the probability of the different hypotheses. The plot can 
however be read in a Bayesian perspective as giving the posterior 
probability density function (p.d.f.), after normalization over the 
whole parameter space, for an implicit prior p.d.f. which is flat in 
deflection angular scales and flat in the logarithm of the source 
density, in the explored range. 

In a frequentist approach, we can also adapt the Neyman con- 
struction for the confidence intervals to the specific case of our 
clustering analysis. The situation is slightly more subtle than in 
usual cases, because here the "measured observable", which is 
^±,min; is not derived from the data alone, but depends on the 
tested model as well. The actual values measured by the experi- 
ment are the number of pairs within each angular scale 6, that is 
0(9). From this, we obtained a "relatively measured observable" 
^±,min(ns, S) for each tested hypothesis, by which we mean that, 
once a set of parameters (ns,<J) is chosen for the reference data 
sets, !P±_,Tiin can be univocally computed from the data, and is 
thus an observable in its own right. In the above sense, however, 
it is measured relatively to the model (ris, 6). 

The Neyman procedure can nevertheless be applied. For a 
pre-specified confidence level, 1 - a, we need to determine lim- 
iting values !P±,min,i and 'P±,min,2 such that 




/(!P±.mi„;«s,<J)d:P±,™„ = 1 -a, (3) 



where /(!P±,niin; "s, ^) is the density of probability that one par- 
ticular data set computed with the assumption that the source 
density is n^ and the deflection scale is 6, be characterized 
by the value !P±,min with respect to this choice of parameters. 
In other words, /(!P±,min;«s,^) is simply the (normalized) his- 
togram shown in Fig. |5](of which there is one for each point of 
the parameter space). 

Now we can choose to set 'P±_nim,2 = 1, in which case !P±,min,i 
satisfies: 

/(!P±,min;ns,Wi,™i„=ff. (4) 

So, comparing with Eq. (j2]), one sees that !P±,min,i is identical to 
^±,min(£'{)) for all the points in the parameter space that give a 
p-value !P-t precisely equal to a in the plot of Fig. [6] In other 
words, the contour defined by the level curve (i.e. identical color 
contour) with value a in the plot also defines a contour of values 
of 'P±,mm.\ for the particular choice of confidence level \ - a. 

Thus, given the Neyman analysis of the confidence intervals, 
this also means that, by construction, the level contours of the 

plots, such as Fig.|6j are exactly the Neyman frequentist con- 
fidence contours, corresponding to frequencies equal to 1 - !P-t. 
In other words, starting with a data set £)o built from a given as- 
trophysical model {n^,5), we can apply the above procedure to 
draw the clustering similarity 'P± based on this data set, and these 
contours will encircle the actual model from which the data set 
was built in exactly a fraction 1 - of the cases. 

As an illustration, we show in Fig.|6]the confidence contours 
extracted from the same analyzed data set £)o as the one used 
to draw the Bayesian p.d.f. If one repeats the construction of 
these contours over many realizations of the same model (n, = 
10"^ Mpc"^^, 6 — 5°, as indicated by the white cross), then by 
construction the red-orange region will enclose the white cross 
in 68% of the cases. Note that, in Fig.|6] we chose a realization 
where this property was actually fulfilled. 
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log^^{ns[Mpe ^]) 

Fig. 7. Influence of the underlying source distribution on the as- 
trophysical constraints derived from the clustering analysis. The 
confidence contours (see Fig.|6]for color meaning) are extracted 
from the analysis of a typical data set of 50 events above an en- 
ergy threshold of 60 EeV, built from a typical realization of the 
reference model (ris - 10"^ Mpc"^, (5 = 5°), indicated by a cross, 
assuming either an a priori uniform source distribution (on the 
left) or an anisotropic distribution (on the right) similar to that of 
the galaxies (see text). 



6.3. Influence of the parameters 

6.3.1 . Spatial distribution of UHECR sources 

The influence of the source distribution on the ability of a 
UHECR experiment to detect an anisotropy through our clus- 
tering analysis has already been shown in Figs. [3] and |4] On 
Fig. |7j we further show that a prior assumption on the spatial 
distribution of the UHECR sources modifies the astrophysical 
constraints that can be drawn from an anisotropic data set. In 
this example, we choose the same density/deflection parameters 
as above (namely = lO"*" Mpc"-' and 6 = 5°), both with an a 
priori uniform source distribution (on the left) and with a source 
distribution following the 2MRS galaxy distribution, which is 
intrinsically anisotropic (on the right). Clearly, a prior expecta- 
tion regarding the overall source distribution (here similar to the 
galaxies in the nearby universe) helps reducing the range of pa- 



rameters compatible with the clustering properties of the data set 
under investigation. 

6.3.2. Size of tine data set 

In Fig. [8] we show the clustering similarity (!P-t) plots, on the left, 
and the corresponding confidence contour plots, on the right, for 
data sets containing either 50, 250 or 1000 events, again built 
from the same model as above (ws = lO^^Mpc""', S - 5°). The 
two figures at the top are actually identical to those of Fig|6] 
but we plot them again for comparison with those obtained with 
larger statistics. 

Not surprisingly, increasing the size of the data set allows 
one to draw stronger constraints and reduces the region of the 
parameter space that is compatible with the data, with given 
confidence levels. However, it can be seen that increasing the 
data set from 250 to 1000 events only has a moderate impact on 
the potential astrophysical constraints. This is because one point 
of the parameter space may result in very different realizations 
of the source position and distances. To use a language mostly 
used in cosmology, there is a large "cosmic variance" associ- 
ated with the astrophysical models represented by single points 
in the plots, especially for low source densities. Different real- 
izations of the same model may lead to rather dissimilar data 
sets, with respect to their clustering properties. Therefore, even 
though very large statistics will eventually give access to a very 
precise measurement of the actual clustering patterns associated 
with the UHECR sources around us, these patterns may be dis- 
similar to some patterns produced by other source distributions 
with the same source density (and deflection angular scale), and 
similar to patterns produced by a number of source distributions 
with different source densities. 

This illustrates a limit of the method. Since we have cho- 
sen to study the clustering properties of the data through one 
global parameter, f'±, the discrimination power remains some- 
what limited, even with very large statistics. On the other hand, 
for the same reason, the method allows one to derive interest- 
ing constraints even with limited data sets. As Fig. [8] tentatively 
illustrates, this method is thus best adapted to data sets from a 
few tens to a few hundreds of events, which corresponds to the 
current and near future stage of development of UHECR obser- 
vations. 



6.3.3. Energy tliresliold 

The size of a data set depends on the energy threshold used to 
build it. Whatever the total acceptance of a given experiment, 
the data set tends to zero as the energy increases, so there will 
always be an energy range where a method such as the one pre- 
sented here may be relevant. As indicated above, the interest of 
the highest energies is that the distance from which UHECRs can 
reach us is more limited, so that fewer sources contribute, and 
characteristic clustering patterns can be explored. It is therefore 
interesting to investigate the potential of the method for higher 
energy events. 

In Fig.|9] we show the confidence contours of the clustering 
similarity plot for a data set of 50 events, with a minimum en- 
ergy, £■„„■„ - 60 EeV (left) and 80 EeV (right), with the same 
model parameters as above («s = 10"^ Mpc"^). Note that the 
former case is redrawn here from Fig.[6]to help comparison. 

As can be seen, the size of the confidence regions is signif- 
icantly smaller in the 80 EeV case. This is because the horizon 
scale is reduced at higher energy (typically 120 Mpc at 80 EeV 
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Fig. 8. Influence of the size of the data set on the astrophysical constraints derived from the clustering analysis. Left column: 
values of the "clustering similarity", "P-t, for all pairs of parameters (11^,6), obtained from the analysis of data sets with an energy 
threshold /imin - 60 EeV, containing respectively (from top to bottom) 50, 250 and 1000 events, each being a typical realization 
of the reference model Dq = (Ws = lO^^Mpc"^,^ = 5°), indicated by the white cross. For all models considered here, a uniform 
distribution of the sources was assumed. Right column: confidence contours extracted from the same data sets. 



vs. 190 Mpc at 60 EeV), so less sources contribute to the ob- 
served flux. Thus for a given size of the data set, one collects 
more events per source, which results in a larger, thus more con- 
straining clustering signal. 



Of course, the higher the minimum energy considered the 
larger the exposure required to achieve the same size of the 
data set. In practice, there is thus a trade off" between the con- 
straining power gained by concentrating on the highest energies 
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Fig. 9. Effect of the energy threshold Emin on the astrophysical 
constraints derived from the clustering analysis. The confidence 
contours are extracted from the analysis of a data set with 50 
events and an energy threshold of 60 EeV (left) and 80 EeV 
(right), both being a typical realization of the reference model 
£)o = (ris - 10"^ Mpc"-',^ - 5°), indicated by the white cross. 
For all models considered here, a uniform distribution was as- 
sumed. 



and that gained by increasing the size of the data set. Here, we 
do not address this specific question in detail. Nevertheless, for 
the sake of completeness, we show in Fig[TO]the results of the 
analysis corresponding to four additional cases: 50 events above 
60 EeV, 50 events above 80 EeV, 250 events above 60 EeV, and 
250 events above 80 EeV. The reference model was chosen to 
have rig - lO""*'^ Mpc"^ and 6 = 10° (as indicated by the cross 
on the plots), with an underlying source distribution following 
that of the 2MRS galaxy catalog. 



7. Conclusion 

In this paper, we have explored how the usual clustering analy- 
sis of a given UHECR data set could be extended to constrain 
the responsible astrophysical models in a 2D parameter space 
consisting of the effective density of UHECR sources and the 
effective angular deflection of the UHECRs from their source to 
the Earth. 

We first studied the sensitivity of a typical UHECR detector 
to anisotropics and showed that detectors with a moderate accep- 
tance can detect a significant anisotropy signal in a large number 
of astrophysical cases, or, failing that, constrain a large class of 
astrophysical models. 



We then introduced an indicator, loosely referred to as the 
"clustering similarity" of a given data set with respect to sample 
data sets built from a given class of astrophysical models. This 
allowed us to propose an improved method to make the most of 
the clustering analysis based on the two-point correlation func- 
tion. 

We applied this method to artificial data sets, demonstrat- 
ing its statistical power and studying the influence of parameters 
such as the size of the data set, the minimum energy considered 
and the spatial distribution of the UHECR sources. 

Since our method condenses all the information contained 
in the two-point correlation function into one estimator, it can 
be powerful even with small data sets, corresponding to the cur- 
rent experimental situation in the GZK energy range. For larger 
statistics, more refined studies (e.g. searching for energy-ordered 
multiplets) will be complementary and help gather more infor- 
mation about the UHECR sources in the universe. 

Finally, let us note that the whole study assumes that one 
can identify one typical deflection angular scale and one typi- 
cal source density, in other words, that the parameter space that 
we explore is indeed relevant to the astrophysical situation un- 
der investigation. This may not be the case for UHECRs, either 
because of a distribution of sources with a luminosity function 
following a power law, thus with no clear source density, or be- 
cause there are more than one component with different typi- 
cal scales. In that case, however, applying this method to actual 
UHECR data sets still allows one to constrain the phenomeno- 
logical models corresponding to the regions of the parameter 
space that would be in conflict with the clustering analysis. 

Note also that in the case of transient sources (such as 
gamma-ray burst or AGN flares, ), what we call here the source 
density is but the density of sources that can actually contribute 
to the UHECR flux observed at the current time. This effective 
density depends on the occurrence (or repetition) rate of the tran- 
sient events, I/tqcc, compared to the time spread, Afobs, of the 
expected signal (see e.g. Murase & Takami, 2009 for the impli- 
cations of UHECRs data on such transient sources). The latter 
happens to be related to the deflection scale, but in any case, the 
effective (instantaneous) density of sources can be obtained as 
"s.eff = "s X Afobs/^occ- This is the source density that should 
appear in the x-axis of our contours plots. 
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Fig. 10. Values of "clustering similarity", !P±, (top four figures) and confidence contours (bottom four figures) for all (ns,6) ob- 
tained with the analysis of 4 data sets with different assumptions, each is a typical realization of the reference model T)Q-{n^ - 
Mpc-^5 = 10°) indicated by a black cross. Rightwards and downwards, the number of events and energy threshold in EeV 
are: (N,E^in) = (50, 60); (250,60); (50, 80); (250, 80). The source distribution follows the galaxies and the acceptance mimics the 
coverage of the Pierre Auger Observatory. 
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